Hands-on Exercise 5a - Creating Ternary Plot with R

Author

Goh Si Hui

Published

February 7, 2024

Modified

February 9, 2024

1 About this Exercise

In this exercise, we will learn how to build ternary plots programmatically using R to visualise and analyse the population structure of Singapore.

What is a ternary plot?

Ternary plots display the distribution and variability of three-part compositional data. In this exercise, the three parts are the proportions of the population that are (1) aged, (2) economically active and (3) young. The plot is drawn as a triangle whose sides are scaled from 0 to 1, with each side representing one of the three components. A point is placed so that lines drawn perpendicular from it to each side of the triangle intersect the sides at the point's component values.

We will see more of it later!
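
As a rough illustration of how a point is located, the sketch below turns three counts into proportions and then into x-y coordinates inside the triangle. The counts are those of Ang Mo Kio Town Centre in 2018, taken from the prepared data shown later in this exercise. The corner layout (young at bottom-left, old at bottom-right, active at the top) is an assumption made for illustration only.

```r
# Counts for one subzone (Ang Mo Kio Town Centre, 2018)
counts <- c(young = 1330, active = 2770, old = 730)

# Step 1: turn the counts into shares that sum to 1
props <- counts / sum(counts)
print(round(props, 3))

# Step 2: place the point in a triangle with unit-length sides.
# Assumed corners: young = (0, 0), old = (1, 0), active = (0.5, sqrt(3) / 2)
x <- props[["old"]] + 0.5 * props[["active"]]
y <- sqrt(3) / 2 * props[["active"]]
print(c(x = x, y = y))
```

A subzone made up entirely of one group would sit exactly on that group's corner, while a subzone with equal shares would sit at the centre of the triangle.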

2 Getting Started

Before we start, let us ensure that the required R packages have been installed and import the relevant data for this hands-on exercise.

2.1 Installing and Loading the Packages

For this exercise, other than tidyverse (in particular readr, dplyr and tidyr), we will use the following packages:

  • ggtern: a ggplot extension specially designed to plot ternary diagrams. We will use this to plot static ternary plots.

  • plotly: an R package for creating interactive web-based graphs based on plotly’s JavaScript graphing library, plotly.js. Besides ggplotly(), which converts ggplot2 figures into plotly objects, it provides plot_ly(), which we will use to build the interactive ternary plot.

The code chunk below uses p_load() of the pacman package to check whether these packages, together with DT (which provides datatable() for interactive tables), are installed on the computer. If they are, they are loaded into the R session. Otherwise, pacman installs them before loading them.

pacman::p_load(tidyverse, plotly, ggtern, DT)
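
For readers without pacman, the check-install-load logic described above can be approximated in base R. This is only a sketch of the idea, not pacman's actual implementation; missing_packages() and load_or_install() are hypothetical helper names.

```r
# Hypothetical helper: return the packages in `pkgs` that are not installed
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Hypothetical helper: install whatever is missing, then attach everything
load_or_install <- function(pkgs) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) install.packages(to_install)
  invisible(lapply(pkgs, library, character.only = TRUE))
}

# Usage (commented out because it may download packages):
# load_or_install(c("tidyverse", "plotly", "ggtern", "DT"))
```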

2.2 Importing the Data

For this exercise, we will use the Singapore Residents by Planning Area Subzone, Age Group, Sex and Type of Dwelling (June 2000-2018) data from SingStat. The course instructor has provided the downloaded data as a CSV file, respopagsex2000to2018_tidy.csv.

The following code chunk uses the read_csv() function of the readr package to import the data into R.

popdata <- read_csv("data/respopagsex2000to2018_tidy.csv")
datatable(popdata)
glimpse(popdata)
Rows: 108,126
Columns: 5
$ PA         <chr> "Ang Mo Kio", "Ang Mo Kio", "Ang Mo Kio", "Ang Mo Kio", "An…
$ SZ         <chr> "Ang Mo Kio Town Centre", "Ang Mo Kio Town Centre", "Ang Mo…
$ AG         <chr> "AGE0-4", "AGE0-4", "AGE0-4", "AGE0-4", "AGE0-4", "AGE0-4",…
$ Year       <dbl> 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2011, 2012,…
$ Population <dbl> 290, 270, 260, 250, 260, 250, 200, 180, 290, 290, 270, 300,…

Note that the data covers 2000 to 2018. Each row gives the number of residents in one age group, in one planning area subzone, for one year. This long format is not directly usable for the ternary plot we are going to make, which needs one row per subzone with the young, active and old populations as separate columns.

2.3 Preparing the Data

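To reshape it, we use mutate() and filter() of the dplyr package, together with spread() of the tidyr package, to:

  1. convert Year from numeric to character,

  2. spread the age groups into columns with spread(), then derive three new measures (YOUNG, ACTIVE and OLD) and their TOTAL by summing the relevant columns,

  3. keep only the rows for year 2018 with a total population greater than 0.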

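agpop_mutated <- popdata %>%
  # Store Year as text so it is treated as a label, not a number
  mutate(Year = as.character(Year)) %>%
  # One column per age group (columns 4 to 21, after PA, SZ and Year)
  spread(AG, Population) %>%
  mutate(YOUNG = rowSums(.[4:8])) %>%      # AGE0-4 to AGE20-24
  mutate(ACTIVE = rowSums(.[9:16])) %>%    # AGE25-29 to AGE60-64
  mutate(OLD = rowSums(.[17:21])) %>%      # AGE65-69 to AGE85over
  mutate(TOTAL = rowSums(.[22:24])) %>%    # YOUNG + ACTIVE + OLD
  filter(Year == "2018") %>%
  filter(TOTAL > 0)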
datatable(agpop_mutated)
glimpse(agpop_mutated)
Rows: 234
Columns: 25
$ PA         <chr> "Ang Mo Kio", "Ang Mo Kio", "Ang Mo Kio", "Ang Mo Kio", "An…
$ SZ         <chr> "Ang Mo Kio Town Centre", "Cheng San", "Chong Boon", "Kebun…
$ Year       <chr> "2018", "2018", "2018", "2018", "2018", "2018", "2018", "20…
$ `AGE0-4`   <dbl> 180, 1060, 900, 720, 220, 550, 260, 830, 160, 810, 350, 282…
$ `AGE05-9`  <dbl> 270, 1080, 900, 850, 310, 630, 340, 930, 160, 1070, 460, 32…
$ `AGE10-14` <dbl> 320, 1080, 1030, 1010, 380, 670, 430, 930, 220, 1300, 490, …
$ `AGE15-19` <dbl> 300, 1260, 1220, 1120, 500, 780, 500, 860, 260, 1450, 400, …
$ `AGE20-24` <dbl> 260, 1400, 1380, 1230, 550, 950, 640, 1020, 350, 1500, 330,…
$ `AGE25-29` <dbl> 300, 1880, 1760, 1460, 500, 1080, 690, 1400, 340, 1590, 310…
$ `AGE30-34` <dbl> 270, 1940, 1830, 1330, 300, 990, 440, 1350, 230, 1390, 310,…
$ `AGE35-39` <dbl> 330, 2300, 1920, 1540, 290, 1100, 400, 1700, 250, 1770, 630…
$ `AGE40-44` <dbl> 430, 2090, 1900, 1700, 420, 1140, 490, 1720, 260, 1860, 810…
$ `AGE45-49` <dbl> 470, 2180, 1910, 1830, 550, 1230, 580, 1530, 320, 2000, 830…
$ `AGE50-54` <dbl> 360, 2160, 2070, 1880, 550, 1350, 640, 1480, 300, 1980, 620…
$ `AGE55-59` <dbl> 310, 2150, 2140, 1810, 560, 1420, 730, 1720, 360, 2010, 460…
$ `AGE60-64` <dbl> 300, 2270, 2170, 1750, 450, 1290, 680, 1680, 350, 1980, 390…
$ `AGE65-69` <dbl> 270, 2130, 2050, 1700, 410, 1150, 500, 1610, 250, 1790, 340…
$ `AGE70-74` <dbl> 190, 1370, 1570, 1240, 290, 830, 280, 1190, 160, 1090, 220,…
$ `AGE75-79` <dbl> 150, 980, 1170, 870, 220, 680, 210, 980, 100, 690, 110, 257…
$ `AGE80-84` <dbl> 60, 560, 640, 540, 140, 360, 180, 560, 70, 390, 80, 1520, 2…
$ AGE85over  <dbl> 60, 440, 530, 430, 140, 340, 130, 460, 60, 310, 100, 1350, …
$ YOUNG      <dbl> 1330, 5880, 5430, 4930, 1960, 3580, 2170, 4570, 1150, 6130,…
$ ACTIVE     <dbl> 2770, 16970, 15700, 13300, 3620, 9600, 4650, 12580, 2410, 1…
$ OLD        <dbl> 730, 5480, 5960, 4780, 1200, 3360, 1300, 4800, 640, 4270, 8…
$ TOTAL      <dbl> 4830, 28330, 27090, 23010, 6780, 16540, 8120, 21950, 4200, …
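
The positional indices above (4:8, 9:16 and so on) depend on the column order after spread(). As a hedged alternative, the sketch below applies the same reshape-and-sum idea to a small toy data set, selecting the age-group columns by name instead. The toy numbers and the group vectors young_groups, active_groups and old_groups are made up for illustration and are not part of the original exercise.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Toy long-format data: two subzones, four age groups (made-up numbers)
toy <- tibble(
  SZ = rep(c("A", "B"), each = 4),
  AG = rep(c("AGE0-4", "AGE05-9", "AGE25-29", "AGE65-69"), times = 2),
  Population = c(10, 15, 50, 20, 0, 0, 0, 0)
)

# Age groups named explicitly instead of by column position
young_groups  <- c("AGE0-4", "AGE05-9")
active_groups <- c("AGE25-29")
old_groups    <- c("AGE65-69")

toy_wide <- toy %>%
  spread(AG, Population) %>%                 # one column per age group
  mutate(
    YOUNG  = rowSums(across(all_of(young_groups))),
    ACTIVE = rowSums(across(all_of(active_groups))),
    OLD    = rowSums(across(all_of(old_groups))),
    TOTAL  = YOUNG + ACTIVE + OLD
  ) %>%
  filter(TOTAL > 0)                          # drops subzone "B"

print(toy_wide)
```

Selecting by name keeps the sums correct even if an age group is added or the column order changes, at the cost of listing the groups explicitly.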

3 Plotting Ternary Diagram with R

3.1 Static Ternary Diagram

We can use the ggtern() function of the ggtern package to create a simple static ternary plot.

ggtern(data = agpop_mutated, 
       aes(x = YOUNG, y = ACTIVE, z = OLD)) + 
  geom_point()

We can further customise the ternary chart by adding titles using labs() from ggplot2 and themes from ggtern package.

For the list of themes provided by ggtern, please refer to the ggtern documentation.

ggtern(agpop_mutated,
       aes(x=YOUNG, y=ACTIVE, z=OLD)) +
  geom_point() +
  labs(title = "Population Structure 2018") +
  theme_rgbg()
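
The same labs() call can also carry friendlier axis titles. The sketch below is a minimal, self-contained variant that uses only the first three subzones from the prepared data shown earlier. It assumes that labs() in ggtern accepts x, y and z titles in the same way as the aesthetics; the data frame name demo is hypothetical.

```r
library(ggtern)

# First three subzones from the prepared 2018 data shown earlier
demo <- data.frame(
  SZ     = c("Ang Mo Kio Town Centre", "Cheng San", "Chong Boon"),
  YOUNG  = c(1330, 5880, 5430),
  ACTIVE = c(2770, 16970, 15700),
  OLD    = c(730, 5480, 5960)
)

# Same plot structure as above, with readable axis titles added
p <- ggtern(demo, aes(x = YOUNG, y = ACTIVE, z = OLD)) +
  geom_point() +
  labs(title = "Population Structure 2018",
       x = "Young", y = "Active", z = "Old") +
  theme_rgbg()
p
```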

3.2 Interactive Ternary Diagram

To create an interactive ternary plot, we will use the plot_ly() function of plotly. This involves three steps:

  1. Create a function that builds the annotation object.

  2. Create a function for axis formatting.

  3. Create the plotly visualisation.

First, we will create a function for the label. In this label, we specify the font size, font color, label background color and border width of the label.

label <- function(txt){
  list(
    text = txt,
    x = 0.1, 
    y = 0.1,
    ax = 0,
    ay = 0,
    xref="paper",
    yref = "paper",
    align = "center",
    font = list(family = "serif", size = 15, color = "white"),
    bgcolor = "#760241", bordercolor = "black", borderwidth = 2)
}
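
Since xref and yref are set to "paper", x and y are fractions of the plotting area rather than data values. If the label needs to sit elsewhere, one option is to expose the position as arguments. The variant below is only a sketch, and label_at() is a hypothetical name not used in the rest of this exercise.

```r
# Hypothetical variant of label() with the position exposed as arguments
label_at <- function(txt, x = 0.1, y = 0.1) {
  list(
    text = txt,
    x = x, y = y,              # position as a fraction of the plotting area
    ax = 0, ay = 0,            # zero arrow offset: the text sits at (x, y)
    xref = "paper", yref = "paper",
    align = "center",
    font = list(family = "serif", size = 15, color = "white"),
    bgcolor = "#760241", bordercolor = "black", borderwidth = 2
  )
}

# Place the label near the top-left corner instead of the bottom-left
top_left <- label_at("Ternary Markers", x = 0.05, y = 0.95)
str(top_left[c("text", "x", "y")])
```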

Then we will create a function for the axis formatting.

axis <- function(txt){
  list(title = txt, tickformat = ".0%", tickfont = list(size=10))
}

Using the axis function created, we will create labels for the axes on the ternary plot.

ternaryaxes <- list(
  aaxis = axis("Young"),
  baxis = axis("Active"), 
  caxis = axis("Old")
)
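
One thing to be aware of is that a user-defined function called axis shadows base R's graphics::axis() in the current session. The sketch below shows the same idea under a different, hypothetical name (ternary_axis) and with the tick font size as an argument. It also illustrates, using base R only, roughly what the ".0%" tick format does to a share.

```r
# Hypothetical renamed helper, so that graphics::axis() is not masked
ternary_axis <- function(txt, tick_size = 10) {
  list(title = txt, tickformat = ".0%", tickfont = list(size = tick_size))
}

ternaryaxes <- list(
  aaxis = ternary_axis("Young"),
  baxis = ternary_axis("Active"),
  caxis = ternary_axis("Old")
)
str(ternaryaxes$aaxis)

# Rough base-R equivalent of ".0%": multiply by 100, no decimals, add "%"
print(sprintf("%.0f%%", 100 * 0.573))
```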

Now we can plot the ternary chart using the scatterternary chart type in plot_ly.

plot_ly(
  agpop_mutated,
  a = ~YOUNG,
  b = ~ACTIVE,
  c = ~OLD,
  color = I("black"),
  type = "scatterternary"
) %>%
  layout(annotations = label("Ternary Markers"),
         ternary = ternaryaxes)
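
In an interactive chart it is often useful to see which subzone a marker belongs to. The sketch below is a minimal, self-contained variant that adds the subzone name as hover text and sets the marker mode explicitly, which may avoid a message from plotly about the mode not being specified. It uses only the first three subzones from the prepared data shown earlier, and the data frame name demo is hypothetical.

```r
library(plotly)

# First three subzones from the prepared 2018 data shown earlier
demo <- data.frame(
  SZ     = c("Ang Mo Kio Town Centre", "Cheng San", "Chong Boon"),
  YOUNG  = c(1330, 5880, 5430),
  ACTIVE = c(2770, 16970, 15700),
  OLD    = c(730, 5480, 5960)
)

p <- plot_ly(
  demo,
  a = ~YOUNG, b = ~ACTIVE, c = ~OLD,
  text = ~SZ,                 # subzone name shown in the tooltip on hover
  color = I("black"),
  type = "scatterternary",
  mode = "markers"            # draw points only
) %>%
  layout(ternary = list(
    aaxis = list(title = "Young"),
    baxis = list(title = "Active"),
    caxis = list(title = "Old")
  ))
p
```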